



NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models

Pandya, Pranshu, Talwarr, Agney S, Gupta, Vatsal, Kataria, Tushar, Gupta, Vivek, Roth, Dan

arXiv.org Artificial Intelligence

Cognitive textual and visual reasoning tasks, such as puzzles, series, and analogies, demand the ability to quickly reason, decipher, and evaluate patterns both textually and spatially. While LLMs and VLMs, through extensive training on large amounts of human-curated data, have attained a high level of pseudo-human intelligence in some common-sense reasoning tasks, they still struggle with more complex reasoning tasks that require cognitive understanding. In this work, we introduce a new dataset, NTSEBench, designed to evaluate the cognitive multi-modal reasoning and problem-solving skills of large models. The dataset comprises 2,728 multiple-choice questions with a total of 4,642 images across 26 categories, sampled from the NTSE examination conducted nationwide in India, featuring both visual and textual general aptitude questions that do not rely on rote learning. We establish baselines on the dataset using state-of-the-art LLMs and VLMs. To facilitate a comparison between open-source and proprietary models, we propose four distinct modeling strategies to handle different modalities (text and images) in the dataset instances.


Plot2txt for quantitative image analysis

@machinelearnbot

In recent times, computation has become both pervasive and less constrained by Moore's Law. This is due in large part to the emergence of cloud computing and the rise of massive parallelism. The former has benefited from network improvements and ever-increasing connectedness, the latter from the appropriation of hardware like Graphics Processing Units (GPUs) for general-purpose computing. This computational leap, coupled with the process of disintermediation [1] taking place around the globe, will continue to support revolutions like artificial intelligence (AI), as many have remarked. AI has a long and interesting history.